ABSTRACT


This paper addresses questions of data sharing from the perspective of a former NSF program 
officer. A brief comparison of policy and research perspectives is made to highlight different values in 
these two communities. Data sharing is framed as one means to support dialog between researchers 
and those involved in policy. Other uses of data sharing to those involved in policy are then outlined; 
it is seen that the role of data sharing is growing in a policy landscape in which program evaluation 
is becoming increasingly important. Despite the desire of Federal agencies to support data sharing 
there are legal, regulatory, and ethical constraints and tensions; these are briefly outlined. To help 
navigate these tensions data sharing is framed in terms of three value propositions: help tell stories, 
create usable ontologies, and support networks. Each of these value propositions is discussed and 
several heuristics are proposed that may help emerging data sharing efforts have more value for 
those who serve engineering education in a policy role. The paper concludes by exploring the larger 
question of data at scale, the need to connect data across scales, and the philosophical issues that 
underlie any future attempts to share data. 
INTRODUCTION


"...the America we have taken for granted for more than a generation is changing. Our society is 
changing: More people are old, fewer are young, more come from minority groups. Our industry 
is changing: We are not the world economic leader we were for so long, but a competitor 
with other industrial nations. Our education system is changing: Although our colleges and 
universities are the envy of the world, they are becoming more and more dependent on foreign 
students and faculty. Our precollege education system has reached a crisis state in which U.S. 
students are no longer competitive with those in other industrialized countries. Our present
scientific and engineering workforce-the foundation for U.S. technological, economic, and 
military leadership is eroding due to retirements and declining student interest.” 

The above quote is drawn from a national level report on engineering education. From the content 
can you determine how many years ago the report was published? While the answer to this question 
is obviously trivia, it is not irrelevant to the larger question of data sharing. Such quotes are often used
to justify requests for engineering education funding, and researchers often cite such reports to highlight
the potential impact of their work. Such reports, however, are generally not written exclusively for
engineering education researchers, but also for policy makers with the purpose of informing policy decisions.

Policy differs from research in important ways. While research frames and answers questions, 
policy is used to guide decisions to achieve “better” outcomes than might otherwise occur. Success 
in research comes from framing questions in a sufficiently narrow way such that they can be definitively
answered using data. Policy decisions on the other hand, particularly those at upper levels of
government, are used to make uncertain choices across very broad domains. Despite these differences,
however, there are close relationships between research and policy. Research impacts policy
since the understanding gained through research, particularly policy-focused research, can lead to 
better outcomes. Policy similarly impacts research through influencing agency budgets, which in turn 
impact research priorities. These relationships are not simple (Stephan 2012). Although the results 
of research can impact policy, policy makers’ breadth of purview means that decisions are greatly 
affected by which sources of information policy makers access, how that information is framed, and 
how trusted the source of information is. Thus a purpose of reports like the one quoted above is to 
frame information in a way that is vital, trusted and accessible. 

Bruner, building upon Vygotsky, identifies two modes of cognitive function, rational and narrative,
and claims that we make sense of lived reality through stories as much as reason. Each mode
has its own logic that implies causality but while rational modes require proof, narrative modes are 
based on likeliness, alignment with one’s past experience, situatedness, and connection to others. 
Bruner argues that we construct stories about alternative realities in an ongoing, recursive process 
that leads to valid, generalized abstractions about the world (Bruner 1987). Policy decisions are uncertain
and have human consequences, and thus are based on both rational and narrative modes of
understanding. In other words, policy is informed by stories, particularly compelling human stories 
based on believable logic models that show an impact on issues important to policy makers such 
as jobs, economic growth, or security. Thus reports like the one quoted above often use a tone of 
crisis as a literary device to help make the story more compelling (Lucena 2005). Both rational and 
narrative modes are important; while stories can sway policy decisions, research and data help sustain
policies by proving or disproving their effectiveness. This paper explores interrelations between
policy and research, the role of data sharing in data-driven evaluation of funding programs, and how
data sharing can help the engineering education community tell compelling stories.

ADVANCES IN ENGINEERING EDUCATION, SPRING 2016
Data Sharing from a Policy Perspective


DATA SHARING FROM A PROGRAM OFFICER’S PERSPECTIVE 


There are many possible perspectives on the relative value of data sharing. In this paper I share 
one viewpoint developed during the time I served as an NSF program officer responsible for funding
engineering education research, from summer 2010 to winter 2012. The writing in this article
will purposely adopt a first person perspective to make clear that I am expressing my own views, 
not those of NSF or other program officers. 

The experience of being a program officer was in many ways transformative, and looking back 
it is clear I entered NSF with many misconceptions about what my job was to be. My naivety is not 
surprising since in hindsight there is not widespread understanding among many of those who 
apply to NSF for funding about the larger policy roles NSF plays. Like many others I viewed NSF 
primarily as an academic funding source whose main role was rating and funding proposals, and did 
not really appreciate the relationship between funding programs and larger policy goals. For this 
reason much of my time at NSF was spent learning the multi-dimensional role that program officers 
play in the larger research ecosystem. To frame the discussion of data sharing, it is first necessary 
to explain some of the day-to-day responsibilities of a program officer and my growing sense of
the differences and similarities between my academic and policy lives. The goal of this section is 
to allow a reader to partially step into the role of a program officer to better understand how data 
sharing might impact how program officers manage their funding programs. 

The interaction most researchers have with program officers (POs) is through the grant funding and 
management process. NSF, however, is a federally funded government agency with a congressionally 
defined mission: “To promote the progress of science; to advance the national health, prosperity, and 
welfare; to secure the national defense....” The legislation that authorizes and funds NSF supports other 
activities relevant to data sharing that include (National Science Foundation 2014): 

• Evaluate the status and needs of the various sciences and engineering and take into consideration
the results of this evaluation in correlating our research and educational programs with
other federal and non-federal programs.

• Provide a central clearinghouse for the collection, interpretation and analysis of data on
scientific and technical resources in the United States, and provide a source of information for
policy formulation by other federal agencies.

• Recommend and encourage the pursuit of national policies for the promotion of basic research
and education in the sciences and engineering.

While making funding decisions and managing awards remained a large part of my day-to-day 
activities, as time went on this role shrank as a relative fraction of my responsibilities. The growing 
role was to serve as an interface between national needs and a research community. 

Information needs to flow in two directions across the policy-research interface. In communicating to 
potential PIs in the research community it was important to ensure my funding programs aligned with both
emerging policy directions and the research being conducted in engineering education. Such alignment 
is most commonly articulated through program descriptions and program solicitations. Since NSF can 
only be as successful as the research it supports, an important aspect of my job was developing funding 
opportunities that both aligned with policy goals and attracted potentially transformative research ideas. 
To maintain program funding, however, information needs to flow from researchers to policy makers as 
well. In this role a program officer serves as a proxy who draws upon expertise from researchers to explain 
how federally funded research in their program contributes to larger national needs by reporting (usually 
indirectly) to the Office of Management and Budget (OMB) and Congress. Here “contribute” implies not 
just answering needs, but also helping to articulate them; this is an important point. Clearly articulating 
needs cannot be done by a program officer alone; it is a responsibility that must also be shared by a 
research community through, at a minimum, clearly expressing the broader impacts research can have. 
In my limited experience PIs who are successful over the span of a career seek to understand how to
align their research, particularly the broad impacts that the research can have, with constantly shifting 
national priorities in ways that go beyond simply quoting Rising Above the Gathering Storm (Committee 
on Prospering in the Global Economy of the 21st Century 2006). Articulating national needs is neither 
simple to do nor is there consensus on what needs are. For this reason policy intersects with politics; while 
politics is beyond the scope of this paper it is a reality successful research communities must manage. 

The role of a program officer is more than making and declining awards. It includes developing 
funding programs, marketing the program both within and external to NSF, collaborating across 
divisions and directorates by serving on interdisciplinary working groups, answering queries about 
the program from other government entities in the executive or legislative branches, and through 
these tasks assisting in making policy decisions. 


DATA SHARING WILL BECOME MORE IMPORTANT AS ACCOUNTABILITY INCREASES 


A good analogy for my former role as a program officer is an investment manager who is tasked 
with maximizing the return on investment to clients, in this case the American public. Of course 
measuring the return on research investments is still challenging (Stephan 2012). Over the last
decades there have been slow but steady changes in how the long-term payoffs of Federal investments
are determined. Recently the shift has been from performance measurement (identifying
outcomes and tracking progress towards those outcomes) to program evaluation that compares
the outcomes of the program with similar alternatives or even with taking no action (Office of Management
and Budget 2014). The Office of Management and Budget increasingly asks Federal agencies
to evaluate the effectiveness of their programs. As a recent White House memo stated (Zients
2012), “Grant-making agencies should demonstrate that, between FY 2013 and FY 2014, they are
increasing the use of evidence in formula and competitive programs.” Such top-down pressure
causes programs to increasingly assess and evaluate their effectiveness and develop theories of
change or logic models that articulate recognizable benefits. These policy changes eventually
find their way down to PIs who are asked to share data or better clarify the broad impacts of their
research. Policy changes have impact outside of government as well, and affect many aspects of 
one’s life. A controversial example, which has been criticized for lack of context, is the Department
of Education’s College Scorecard (Department of Education 2014) that tracks measures of
university performance. 

Clearly evaluating policies and programs can provide multiple benefits, from better allocation
of scarce resources within agencies to answering questions such as “how does ___ create jobs
in my district” in the political sphere. However there are some limitations to program evaluation.
OMB is cognizant that good evaluation is resource intensive, so recent budgets provide resources 
for evaluation (Burwell et al. 2013) that must be reallocated from other sources. Additionally you 
can only evaluate what you look for, so too much focus on evaluation potentially limits innovation. 
Multi-tiered models that invest both in proven interventions as well as more speculative ideas are 
promoted by OMB (Office of Management and Budget 2014). A more subtle criticism is that evaluation 
often depends upon linear logic models or theories of change which may be inaccurate for nuanced 
and complex systems like education where small changes in context can make large differences in 
outcomes (Tough 2013). In agencies like NSF that are not as mission oriented as, for example, NASA 
or DoD, such models have significant limitations. For example, the return on investments in education
is not isolated from other Federal programs. Academic opportunities increasingly depend on
socio-economic status and experiences in early childhood, which in turn depend on programs such 
as Head Start and decades-old policy choices about social vs. military spending. Because no single 
evaluation program can provide the breadth of data needed for policy decisions, evidence framed 
so as to be widely understandable that is backed by a compelling story is highly valued. 

It is a truism to state that political values and ideologies change over time. Over the last decade public 
dialog around higher education has increasingly been framed negatively, leading to a perception that 
investments in higher education are ineffective¹. One impact of such perceptions was that my time at
NSF was increasingly spent evaluating my portfolio of awards (many of which were made by my predecessors)
to accurately represent the overall impacts to higher level policy makers. Claiming impact is
difficult. The standard of proof and the consequences of providing incorrect or misleading information
are different in the government than in academia. Information is generally verified in a more hierarchical
fashion than academic peer review. Similarly while academics have little incentive and few channels
to publish negative results, the more procedural approaches in government seek to capture such data. 
At NSF, as in any organization, evaluation that confirms conventional wisdom is more easily accepted. 


ISSUES AND TENSIONS OF DATA SHARING 


The above discussion of the Federal research funding side of engineering education, based upon 
my admittedly limited experience as an NSF program officer, illustrates how access to data is useful 
to program officers. The extent to which a research community transparently shares data funded 
by research awards will impact a program director’s ability to evaluate the impact of their program 
or enable other individuals or agencies to perform such evaluations. 

Before discussing potential benefits and problems related to data sharing, it is important to clarify 
what is meant by data. Since a major function of government is to make laws, there is generally 
great care taken with the use of language so that data has, for the purposes of NSF, an operational 
definition: 

"... the recorded factual material commonly accepted in the scientific community as 
necessary to validate research findings, but not any of the following: preliminary analyses, 
drafts of scientific papers, plans for future research, peer reviews, or communications with 
colleagues. This ‘recorded’ material excludes physical objects (e.g., laboratory samples). 
Research data also do not include: 

(A) Trade secrets, commercial information, materials necessary to be held confidential by a
researcher until they are published, or similar information which is protected under law;
and

(B) Personnel and medical information and similar information the disclosure of which would
constitute a clearly unwarranted invasion of personal privacy, such that could be used to
identify a particular person in a research study." (Office of Management and Budget 1999)

¹ Ideologies span the political spectrum and values are often influenced by ideas in circulation at any moment. A historical
example is the Powell memo (Powell 1971) which influenced efforts to bring more free market ideologies into higher education
(Demarrais 2006). More recent examples focus on rising costs (Kirshstein and Wellman 2012) and the idea of “disruption” which
has entered higher education from business (Christensen et al. 2011) and is widely quoted despite recent critiques (LePore 2014).
See also Peter Brooks’ commentary on, and review of, recent books painting higher education in crisis (Brooks 2011).

While OMB provides the broad definition above, for this paper data can be succinctly defined as the 
information that emerges from a research study which is deemed to be true, important, and that has no 
legitimate reason for not being shared. The term “data” includes metadata that explains how the data 
was collected since the value of data itself depends upon an understanding of the context in which 
the data was generated. In engineering education this context is critical yet difficult to characterize. 

In January 2011 NSF began to require a data sharing plan, a two-page addendum to proposals that
would be rated as part of the intellectual merit, but various constraints make this policy
difficult to implement. Since Federal agencies are required to follow the Government Performance
and Results Act (GPRA) which mandates efficient operation, NSF’s Grant Proposal Guide indicates that
data sharing should not incur “...more than incremental cost...”. The question of how meaningful data
sharing can be managed with only incremental cost remains unanswered. There remain ontological 
uncertainties with what constitutes data and metadata—notwithstanding OMB definitions—that 
need to be resolved within each discipline; recent work on ontologies by Cynthia Finelli (Finelli 
and Borrego 2014) and others may help in engineering education. There are additionally important 
ethical questions that arise at the interface of technology and privacy, regulatory and legal issues 
such as the Family Educational Rights and Privacy Act, the fact that IRB processes vary between 
institutions, and issues of the potential liability of researchers and institutions who share data that 
is used for unintended purposes. What “data” is varies greatly across the disciplines supported by 
NSF. Practices that work in particle physics—where sharing raw data can allow external validation 
and review of results or even new discoveries—may not work in engineering education. 


USING VALUE PROPOSITIONS AND HEURISTICS TO NAVIGATE ISSUES 


In my opinion, these real constraints mean that data sharing will not occur in a meaningful way 
if viewed by a research community as a Federal mandate. Furthermore, I consider it unlikely that a 
widely-accepted model that is effective for engineering education will be developed by NSF or another 
government agency. Given the need to develop best practices in data sharing that cover the wide 
spectrum of engineering education research approaches it will further be difficult, if not impossible, 
to frame data sharing in a rigorous or deontological way. For these reasons I adopt a value framework 
in exploring data sharing (Carlson and Wilmot 2006). Value can be roughly defined as the ratio of 
the benefit to the cost, so from the value perspective the benefits of data sharing must be greater
than the cost for multiple stakeholders including researchers, universities, and funding agencies. 
The intent of defining value is not to propose an exact economic calculus; both “benefit” and “cost” 
include intangibles. Rather it is to identify ways in which data sharing could create value and develop 
particular value propositions for data sharing guided, where possible, by actionable heuristics. 

The value of data sharing means different things to different organizations. Different sectors of 
the engineering education ecosystem—Federal agencies, large schools with significant resources, 
and small schools with more limited resources—will have different value propositions. Furthermore 
given the breadth of research approaches in the engineering education community, data sharing 
likely has very different connotations to researchers from different traditions. Since it is not possible
to represent all viewpoints I take the stance of an NSF program officer; my stance might be
different in my role as a researcher, as department chair, or even had I served in the Education and 
Human Resources rather than the Engineering Directorate within NSF. In the remainder of the paper 
I discuss three ways that data sharing could potentially have created value for me as an NSF program
officer. From this perspective I will try to identify both benefits (e.g. understanding how NSF funding
has increased the connectivity of PI networks) and costs such as program funds, time,
or bureaucratic red tape. I will skip over the obvious ways shared data can improve the proposal
review process, such as finding panelists or judging the merits of a proposal; while these are
important, they are lesser concerns in the larger realms of policy.

Value Proposition #1: Help Tell Stories

The first value proposition is that data sharing can better enable a program officer to tell sequential,
ideally causal, stories about the impact of their program on the larger engineering education
system. Here the word “system” is used in the integrative, rather than reductionist, sense to capture
the combination of people, programs, and institutions that support and credential engineering learning.
Such stories, where small investments lay the groundwork for long-term changes, are critical
to explain the impact of research programs given the relatively small role NSF funding plays in the 
education system. In 2013 state funding for higher education varied from about $360M in Vermont 
to $14B in California with a median of about $1.8B (Staff of the State Higher Education Executive 
Officers 2013). Amounts are given in US Dollars with M = million, B = billion, and T = trillion. Funding 
from all states totaled about $130B. Additionally the top 800+ universities managed endowments 
of approximately $400B in 2013 (Finance: Almanac 2013 2013). If one calculates the annual revenue 
from tuition of engineering undergraduate students in the US, the total comes to about $10B (Cheville 
2012). In comparison NSF’s EHR Directorate was funded at about $830M in FY2013 with the Engineering
Education & Centers Division in the ENG Directorate at about $115M, of which a large part went
to Engineering Research Centers. The 2011 Committee on STEM report found that in 2010 $1.1T or
about 8% of GDP went to education overall. About 40% of this education spending was in the postsecondary
sector of which agencies such as NSF contributed 0.2% (Committee on STEM Education
2011). Given these dollar values any one award can have only a minuscule direct impact on measures that
policy makers care about—jobs, competitiveness, economic growth, or the number of engineering 
graduates. For this reason funding programs attempt to create synergy between awards such that 
incremental advances build upon each other. NSF funding is the seed which must be nourished by 
other funding sources to grow to a level that can impact policy decisions. 
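
To make the difference in scale concrete, the figures quoted above can be set side by side. The short Python sketch below is only a rough illustration: it uses the approximate dollar amounts from this section, the variable names are my own, and the comparisons mix budgets that serve different purposes, so the percentages should be read as orders of magnitude rather than precise shares.

```python
# Approximate figures quoted in the text, in US dollars (2013 unless noted).
M, B = 1e6, 1e9

nsf_ehr = 830 * M              # NSF EHR Directorate, FY2013
nsf_eec = 115 * M              # NSF Engineering Education & Centers Division
state_funding_total = 130 * B  # state higher-education funding, all states combined
engineering_tuition = 10 * B   # annual tuition revenue from engineering undergraduates


def share(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# EHR plus EEC overstates what reaches engineering education, since EHR covers
# all of STEM and a large part of EEC goes to Engineering Research Centers.
nsf_combined = nsf_ehr + nsf_eec

print(f"EHR + EEC as a share of state funding: {share(nsf_combined, state_funding_total):.2f}%")
print(f"EEC as a share of engineering tuition: {share(nsf_eec, engineering_tuition):.2f}%")
```

Even with generous assumptions the NSF amounts come to roughly one percent or less of the other funding streams, which is why stories about how awards seed larger efforts matter more than the direct effect of any single award.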

The costs of data sharing are in setting up a system to turn data into knowledge, researchers’ time,
and creating a culture within the community that values building a coherent body of knowledge. How
the system is set up would dictate to a large extent the cost of creating actionable insights from the
data. If data sharing were widely adopted in a way that meaningfully addressed data provenance, and
resources were available to draw stories from this data, the benefit would be to help program officers
tell the story of how awards in their portfolio seeded larger efforts. While annual progress reports and
PI-submitted highlights are intended to capture such impacts, generally reporting ends with the award
period before results can achieve wide impact (Stephan 2012). Furthermore award-centric stories may
fail to capture connections between investments. Bibliometric research techniques primarily capture
publications and miss other impacts that are important to policy makers. Newer techniques such as
altmetrics (Priem, Groth, and Taraborelli) are seeking to capture other measures such as mentions in
social media, but their limits and veracity are still being tested (cf. Information Science Quarterly, vol.
25, issue 2). It is worth mentioning that much knowledge of how the findings of many small projects 
compound over time currently resides in the institutional memory of career program officers who often 
have strong social networks in policy circles. When such career program officers retire or move to other 
research programs it can have a long-term effect on a field. Should the institutional culture at NSF change 
in a way that impacts the long term retention of program officers, the effects could be far reaching. 

In summary the first value proposition for data sharing is to be able to better connect how what 
was learned in one NSF project influenced other efforts—publications, proposals, internally funded 
reform initiatives—over a period of decades. Heuristics for a data sharing effort that would support 
story telling include capturing NSF award numbers, capturing why data was collected, and developing
capacity for users of data to explain how drawing on previous results led to new insights or
collaborations. Policy is driven in part by data and the stories that emerge from analyzing that data². If
engineering education is to meaningfully share data, it is worth considering up front ways that make
it easier for such stories to be written. A related effort is developing new ways of representing data
that can help brand engineering education so that the goals of the research community can be
consistently communicated to policy makers.

² This paper does not assert that data and stories are the only influence on policy decisions. Policy is driven by factors such
as public opinion, economic conditions, and interest groups as well as networks and relationships (Marsh and Smith 2000). Data
plays a role in supporting these factors however, and can influence policy either for good (e.g. data on highway fatalities
influencing transportation safety) or for ill, as Robert McNamara’s emphasis on body counts in the Vietnam War shows.
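
One way to picture the story-telling heuristics is as a small metadata record attached to each shared dataset. The Python sketch below is purely illustrative: the record layout, the field names, the `trace_lineage` helper, and the placeholder award labels are all my own assumptions rather than any NSF or community standard. It shows how recording the award number, the purpose of collection, and links to earlier datasets would let someone reconstruct which awards a later result built upon.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDataset:
    """Hypothetical metadata record for one shared dataset."""
    dataset_id: str
    nsf_award_number: str  # award that funded the data collection
    purpose: str           # why the data was collected
    builds_on: list[str] = field(default_factory=list)  # ids of earlier datasets drawn upon
    reuse_note: str = ""   # how earlier results led to new insights or collaborations


def trace_lineage(dataset_id: str, catalog: dict[str, SharedDataset]) -> list[str]:
    """Return the award numbers behind a dataset, earliest contributions first."""
    seen: set[str] = set()
    awards: list[str] = []

    def visit(current_id: str) -> None:
        # Skip unknown ids and anything already visited (guards against cycles).
        if current_id in seen or current_id not in catalog:
            return
        seen.add(current_id)
        record = catalog[current_id]
        for parent_id in record.builds_on:
            visit(parent_id)
        if record.nsf_award_number not in awards:
            awards.append(record.nsf_award_number)

    visit(dataset_id)
    return awards


# Placeholder records; the award labels are invented for illustration.
catalog = {
    "ds-1": SharedDataset("ds-1", "AWARD-A", "pilot survey on first-year retention"),
    "ds-2": SharedDataset("ds-2", "AWARD-B", "follow-up interviews",
                          builds_on=["ds-1"],
                          reuse_note="survey results shaped the interview protocol"),
    "ds-3": SharedDataset("ds-3", "AWARD-C", "multi-institution study",
                          builds_on=["ds-2"]),
}

lineage = trace_lineage("ds-3", catalog)
print(" -> ".join(lineage))
```

The design choice worth noting is that the links point backward from new work to the data it drew on, so the cost of recording them falls on the person who benefits from reuse rather than on the original investigator.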

Value Proposition #2: Ontologies and Grand Challenges 

The second value proposition is that by learning how to share data and debating what constitutes 
“data”, engineering educators will become better at developing more consistent terminology and 
begin to self-organize around a set of important problems. This is clearly a second order effect, and I 
am admittedly optimistic to believe data sharing will result in this outcome. However my experience 
in managing a relatively small funding program is that developing a consistent vocabulary, precision 
in underlying concepts, and clear goals are becoming increasingly important to sustain Federal investments.
In the current political climate of trimming government spending it is important for programs
to clearly articulate both how they align with national priorities and differ from similar efforts.
Remember that policy is a process used to guide decisions to achieve “better” (from some perspective)
outcomes than would otherwise occur, and the appearance of redundancy is carefully scrutinized. The 
costs arise from the effort needed to develop more conceptually precise definitions and terminology. 

A recent example of how high level policy decisions can affect funding programs and the research
communities they support is the Committee on STEM (CoSTEM) report from the Office of
Science and Technology Policy (OSTP) in the White House (Committee on STEM Education 2011).
The report was the result of a cross-agency survey of STEM education funding and led to the release
of a strategic plan in 2012 (Federal Coordination in STEM Education Task Force- Committee
on STEM Education 2012). The plan is too comprehensive to discuss in detail, but a major theme is
coordinating STEM education investments across the government. The strategic plan focuses heavily
on evaluation, common metrics, milestones, and synthesizing practices. Four “Strategic Federal
Coordination Objectives” were developed: 

• Use evidence-based approaches. 

• Identify and share evidence-based approaches. 

• Increase efficiency and coherence. 

• Identify and focus on priority areas. 

To achieve these objectives agencies are expected to develop roadmaps—hopefully in conjunction 
with research communities—to address the priority areas. Note that the espoused values of such 
reports do not always align with what is enacted in the policy realm. 

The process of implementing data sharing and the discussions exemplified by this special edition
can enhance the ability of engineering education to share ideas. In considering the merits and
disadvantages of data sharing it is important to ask what distinguishes engineering education from
the rest of the STEM ecosystem, and what priority goals our community can agree upon. These are
contentious questions, but the needs identified by policymakers are only as good as the information
they receive. As an example relevant to engineering education one of the priority areas stemming
from the fourth coordination objective is improving retention of students in the first two years in
undergraduate STEM education. There is some evidence that in engineering getting students to transfer
into the major is a larger issue than retention (Atman et al. 2010); this is an example of how research
might be used to inform changes to existing policies.

The second value proposition is that the act of creating meaningful ways to share data in itself 
creates value. One heuristic for data sharing is to develop a clear and simple language and a set 
of organizing ideas. A second heuristic is to support discussions on what constitutes valid data in 
engineering education, how data is tagged, and what minimal set of metadata is needed for context. 
There is increasing use of data mining tools within NSF to understand investments within a program, 
across divisions, directorates, and all of NSF. It is easier for a program officer, or their proxy, to use 
these tools to extract meaningful insights when a research community agrees on what constitutes 
data and uses consistent terminology. Thus a third heuristic is to design data sharing in a way that 
users are able to easily perform queries that can inform policy-relevant questions. 
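
As a minimal sketch of the second and third heuristics, assume a community had agreed on a small set of metadata fields. A shared record and a policy-relevant query might then look something like the following. The field names (`method`, `student_level`, `institution_type`, `topic_tags`) and the sample records are hypothetical illustrations; they are not drawn from any NSF system or existing repository.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """One shared dataset plus the minimal context needed to interpret it.

    All field names are hypothetical; a real community would negotiate its own.
    """
    title: str
    method: str            # e.g. "qualitative", "quantitative", "mixed"
    student_level: str     # e.g. "first-year", "graduate"
    institution_type: str  # e.g. "2-year", "4-year"
    topic_tags: frozenset = field(default_factory=frozenset)


def query(records, *, tag=None, student_level=None):
    """Return the records that match every criterion that was supplied."""
    matches = []
    for record in records:
        if tag is not None and tag not in record.topic_tags:
            continue
        if student_level is not None and record.student_level != student_level:
            continue
        matches.append(record)
    return matches


# Illustrative records only; they do not describe real studies.
RECORDS = [
    DatasetRecord("Interview study A", "qualitative", "first-year", "4-year",
                  frozenset({"retention", "identity"})),
    DatasetRecord("Survey study B", "quantitative", "first-year", "2-year",
                  frozenset({"retention", "transfer"})),
    DatasetRecord("Design study C", "mixed", "graduate", "4-year",
                  frozenset({"design"})),
]

# A question a program officer might pose: what data exist on first-year retention?
first_year_retention = query(RECORDS, tag="retention", student_level="first-year")
print([record.title for record in first_year_retention])
```

The point of the sketch is that the query is only simple because every record uses the same vocabulary; if each project tagged retention differently, the same question would require reading each dataset by hand.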

Value Proposition #3: Creating Networks of Researchers 

The author Parker Palmer shares a story about a government employee at the Bureau of Land 
Management who came to one of his retreats conflicted about his career (Palmer 2004). A new 
political appointee was pushing policies that the man believed undermined BLM’s mission, and this 
created stress. In the silent space created at the retreat the man realized his conflicted state arose 
because he was not sure who he served. When he realized that regardless of whom he worked 
for he ultimately served the land, the conflict evaporated. This question of who I served was one 
that was much on my mind as a program officer, and one that I never fully resolved. Clearly while 
improving engineering education serves national policy needs, the field cannot advance without 
the efforts of many individuals whose careers are greatly impacted by NSF funding decisions. With 
the increasing emphasis on evaluating the products produced by funding programs, it is important 
not to overlook the role that such programs play in the development of new, and support of established, 
researchers. The impact of grants includes people, particularly graduate students and young 
faculty, as well as publications and insights. There is a clear long-term benefit that arises from a 
networked, cooperative community of PIs focused on a set of problems that is diverse enough to 
generate new insights yet coherent enough to articulate the impact it is having on national needs. 
Engineering education draws from at least two disciplines—engineering and education—that have 
different cultures; the community is developing its own theories, norms, and values and needs to 
be supported in these efforts. 

Thus the third value proposition is that data sharing can create a better networked engineering 
education community. Depending on how data sharing systems are implemented, it is not just the 
results of research that are shared, but also knowledge of who knows what and how they came to 
develop that expertise. Given the breadth of a program officer’s purview, this information is of great 
value to determine emerging areas, whether a community is sizable enough that a new funding 
program is needed, or simply to suggest collaborations to visiting PIs. 

A heuristic in designing data sharing is to make any system people- as well as data-centered. 
While privacy concerns limit adoption in the US, research reporting systems in some countries 
specifically have adopted a researcher-based focus. An example is the LATTES system in Brazil 
(Pacheco 2012). LATTES is managed by the Brazilian science funding agency, the Brazilian 
National Council for Scientific and Technological Development (CNPq). Anyone who receives 
funding from CNPq, including students funded by an award, must have a CV on file in the LATTES 
system and update their CV annually. Once CVs are integrated into the LATTES database, they 
can be widely accessed and searched, which provides value to researchers seeking to establish 
collaborations. Since LATTES was designed with the intent of supporting research collaborations, 
it also supports a database of research groups. While it is still an open question of how 
best to evaluate the impacts of investments in research, there are significant efforts underway 
in many countries to improve the science of science policy (Lane 2009). LATTES is one model 
that collects data from CVs and thus is organized around individuals in the research community. 
The similar effort in the US, STAR METRICS (U.S. Department of Health and Human Services 
2014), uses research awards as the central unit of analysis. It will be interesting to see how each 
system evolves. 
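
The difference between the two units of analysis can be sketched with a toy data structure. The example below is an illustration only and does not reflect how LATTES or STAR METRICS is actually implemented; the award identifiers, topics, and names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical award records; identifiers and names are placeholders.
AWARDS = [
    {"award_id": "A-001", "topic": "retention",
     "people": ["Researcher 1", "Student 1"]},
    {"award_id": "A-002", "topic": "design",
     "people": ["Researcher 1", "Researcher 2"]},
    {"award_id": "A-003", "topic": "retention",
     "people": ["Researcher 3"]},
]


def index_by_award(awards):
    """Award-centered view: the award is the unit of analysis."""
    return {award["award_id"]: award for award in awards}


def index_by_person(awards):
    """Person-centered view: each individual maps to the awards they worked on."""
    by_person = defaultdict(list)
    for award in awards:
        for person in award["people"]:
            by_person[person].append(award["award_id"])
    return dict(by_person)


def who_knows(awards, topic):
    """List the people associated with a topic, in alphabetical order."""
    return sorted({person
                   for award in awards if award["topic"] == topic
                   for person in award["people"]})


print(index_by_person(AWARDS)["Researcher 1"])
print(who_knows(AWARDS, "retention"))
```

Both views are built from the same records. What changes is which questions are easy to ask: the award-centered index answers "what did this investment fund?", while the person-centered index answers "who has worked on this, and through which projects?"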


FUTURE WORK ON METHODOLOGICAL APPROACHES 


One of the questions that frame the issue of data sharing in engineering education is whether 
new methodological approaches are needed for handling data at scale. The question as it is phrased 
implies that there is a need to handle data at large scales and that if new methods were to be developed 
some value would emerge from them. I would like to reframe this question by challenging the 
concept of scale. In engineering education “large scale” might be defined in terms of understanding 
the system that educates future engineers. The number of engineering undergraduates entering and 
leaving US universities in any given year is on the order of 10^5 at roughly 10^3 institutions. Including 
technical programs at 2 year colleges adds a similar amount (Cheville 2012). In terms of research, 
the annual number of engineering education publications is likely on the order of 10^2-10^4, depending 
on how one defines research. In terms of actual numbers these are not particularly large compared 
to the 3x10^9 human DNA base pairs or 10^9 Tweets per week generated by Twitter users. 

The issue in engineering education is not the scale of data but that making sense from the data 
depends on local contexts and environments that are best understood through more qualitative 
techniques. Because creating large scale data generally requires some form of aggregation which 
removes these critical local variations, the problem I saw as a program officer was not the size of 
the dataset, but rather how to meaningfully connect data across scales of the education system. 
This is roughly illustrated in Figure 1. At the finest granularity, i.e. most central level, are individual 
students that can be aggregated to provide data on retention or graduation rate. As discussed 
previously, Federal funding is not sufficient to affect enough students to significantly 
impact these numbers. However, students are affected by more coarse grained elements of 
the system such as what is taught (content), how it is taught (pedagogy), and the overall learning 
environment (education system) where ideas from research can ideally impact practice. The arrow 
illustrates that one further has to choose between certainty of impact and investing at scale. In other 
words investing in people, for example by providing scholarships or supporting REU programs, 
has high probability of impact but is not scalable to the entire education system. That is not to say 
these programs are ineffective; people are not ants and individuals can do great good in the world. 
More systemic investments, e.g. researching learning, have the potential to impact the system as a 
whole, but impact is not as certain and thus much more difficult to evaluate. There is a critical need 
in engineering education to develop techniques that help understand the system as a whole, i.e. 
connect rich, local qualitative data with large scale quantitative data. While some initial efforts are 
underway (Lemke and Sabelli 2008) more work is needed. 

Figure 1. Illustration of different scales of the education system. 

There are other, more subtle issues related to large scale data. One is that heavily investing in 
capabilities to explore large scale data may trigger an inverse form of the “law of small numbers” 
fallacy and lead to a belief that large population trends are representative of individuals. While 
questions of overall number are important to policy makers, the benefits of education accrue both 
to society and to individuals. These benefits are inextricably linked and not easily separable. One 
cannot lose sight of the individual because ultimately it is individuals who are educated and who 
collectively make up society. Furthermore, because a few individuals can have a collective effect 
on the education system, focusing only on large scales makes it possible to overlook important 
impacts of investments. 

A second issue that arises from such individual-collective tensions is related to what student data 
informs policy decisions. In other words, if researchers are not extremely careful about clarifying 
which groups of students give rise to trends in large scale data sets then some groups may be adversely 
affected by policy decisions. Jaron Lanier in Who Owns the Future? explored issues of who 
owns the rights to data drawn from an ensemble of individuals and suggested an economic model 
of micropayments when data is used (Lanier 2013). A micropayment model is likely not feasible for 
engineering education but the questions Lanier raises are relevant to metadata, privacy, and the 
importance of maintaining sight of one individual in large data sets. 
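
The concern about which groups give rise to a large scale trend can be made concrete with a small arithmetic sketch. The counts below are made up purely for illustration and are not measurements of any real student population; the group names are placeholders.

```python
# Made-up counts for illustration only; not data from any real population.
GROUPS = {
    "group_a": {"enrolled": 900, "retained": 810},
    "group_b": {"enrolled": 100, "retained": 40},
}


def retention_rate(enrolled, retained):
    """Fraction of enrolled students who were retained."""
    return retained / enrolled


def overall_rate(groups):
    """Aggregate rate across all groups, as a large scale data set would report it."""
    enrolled = sum(g["enrolled"] for g in groups.values())
    retained = sum(g["retained"] for g in groups.values())
    return retention_rate(enrolled, retained)


def rates_by_group(groups):
    """Rate computed separately for each group."""
    return {name: retention_rate(g["enrolled"], g["retained"])
            for name, g in groups.items()}


print(f"overall: {overall_rate(GROUPS):.2f}")
for name, rate in rates_by_group(GROUPS).items():
    print(f"{name}: {rate:.2f}")
```

With these hypothetical counts the aggregate is dominated by the larger group, so a policy judged only against the overall figure could look successful while the smaller group fares far worse. Keeping group-level metadata attached to shared data is what makes the second calculation possible at all.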


PARTING THOUGHTS 


It is important to realize that data sharing is not an end in itself, but a means to an end. Much 
data is already shared; the current reality is that all entities in the public and private sectors are 
increasingly using sophisticated tools to analyze large data sets to further their missions. Thus I 
believe the question is not whether we should or should not share data, or how such data sharing 
should occur, but how data sharing can best serve the greater good, or what end shared data should 
advance. The question of the greater good is clearly a philosophical one, but one that I maintain is 
becoming increasingly important for engineering education. 

To close the loose end that began this paper, the quote at the beginning of this article came 
from a report given to me by an NSF colleague entitled “Changing America: The New Face of 
Science and Engineering” published about a quarter century ago, in 1988 (The Task Force on 
Women 1989). If the quote seems relevant today, if we are addressing the same policy issues as 
we were a quarter century ago, then a conversation about ends, about what greater good we 
are working towards, is likely in order. One’s own definition of the greater good is not 
something borrowed from policy reports but rather developed through a lifetime of questioning our 
underlying assumptions and schemas. Data sharing is one way to help us, as a community, learn 
to ask the right questions. The questions we ask and the answers we obtain can inform the policy 
decisions that affect our community. But these insights need to be supported by data that also 
gives us insights into the complex workings of the education system in a way that does not lose 
sight of the individual. 

The ideas in this paper were developed in part while the author served at 
NSF, supported under IPA funding. Any opinions, findings, and conclusions or recommendations 
expressed in this material are those of the author and do not necessarily reflect the views 
of the National Science Foundation. 